Model-based Clustering with Soft Balancing
نویسندگان
چکیده
Balanced clustering algorithms can be useful in a variety of applications and have recently attracted increasing research interest. Most recent work, however, addressed only hard balancing by constraining each cluster to have equal or a certain minimum number of data objects. This paper provides a soft balancing strategy built upon a soft mixtureof-models clustering framework. This strategy constrains the sum of posterior probabilities of object membership for each cluster to be equal and thus balances the expected number of data objects in each cluster. We first derive soft model-based clustering from an information-theoretic viewpoint and then show that the proposed balanced clustering can be parameterized by a temperature parameter that controls the softness of clustering as well as that of balancing. As the temperature decreases, the resulting partitioning becomes more and more balanced. In the limit, when temperature becomes zero, the balancing becomes hard and the actual partitioning becomes perfectly balanced. The effectiveness of the proposed soft balanced clustering algorithm is demonstrated on both synthetic and real text data.
منابع مشابه
Development of Model and Algorithm for Depot Balancing Multi-Depot Vehicle Scheduling Problem Considering Depot Balancing
The main of multi-depot vehicle scheduling problem (MDVSP) is to schedule the timetabled trips using limited resources, optimally. The problem is very important in the management of the transportation systems. One of the most useful ways to better manage these systems is to consider the real conditions including depot balancing constraints. To normalize the number of vehicles departed from each...
متن کاملEntropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
متن کاملAsthma Control Level Assessment by Moving from the Current Reactive Care Models into a Preventive Approach based on Fuzzy Clustering and Classification Algorithms
Background and Aim: Asthma is a common and chronic disease of respiratory tracts. The best way to treat Asthma is to control it. Experts of this field suggest the continues monitoring on Asthma symptoms and adjustment of self-care plan with offering the preventive treatment program to have desired control over Asthma. Presenting these plans by the physician is set based on the control level in ...
متن کاملApplication of Soft Computing Methods for the Estimation of Roadheader Performance from Schmidt Hammer Rebound Values
Estimation of roadheader performance is one of the main topics in determining the economics of underground excavation projects. The poor performance estimation of roadheader scan leads to costly contractual claims. In this paper, the application of soft computing methods for data analysis called adaptive neuro-fuzzy inference system- subtractive clustering method (ANFIS-SCM) and artificial neu...
متن کاملBreast Cancer Risk Assessment Using adaptive neuro-fuzzy inference system (ANFIS) and Subtractive Clustering Algorithm
Introduction: The adaptive neuro-fuzzy inference system (ANFIS) is a soft computing model based on neural network precision and fuzzy decision-making advantages, which can highly facilitate diagnostic modeling. In this study we used this model in breast cancer detection. Methodology: A set of 1,508 records on cancerous and non-cancerous participant’s risk factors was used. First,...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2003